Identification of consensus patterns in unaligned DNA sequences known to be functionally related

Authors

  • G. Z. Hertz
  • G. W. Hartzell
  • Gary D. Stormo
Abstract

We have developed a method for identifying consensus patterns in a set of unaligned DNA sequences known to bind a common protein or to have some other common biochemical function. The method is based on a matrix representation of binding site patterns. Each row of the matrix represents one of the four possible bases, each column represents one of the positions of the binding site, and each element is determined by the frequency with which the indicated base occurs at the indicated position. The goal of the method is to find the most significant matrix (i.e. the one with the lowest probability of occurring by chance) out of all the matrices that can be formed from the set of related sequences. The reliability of the method improves with the number of sequences, while the time required increases only linearly with the number of sequences. To test this method, we analysed 11 DNA sequences containing promoters regulated by the Escherichia coli LexA protein. The matrices we found were consistent with the known consensus sequence, and could distinguish the generally accepted LexA binding sites from other DNA sequences.
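The matrix representation described in the abstract can be illustrated with a minimal sketch. It assumes the candidate sites have already been chosen and aligned, which is the hard part that the method itself solves. The abstract does not say which statistic defines "most significant", so the information content against a uniform background is used here only as a stand-in score. The function names and the example sites are hypothetical and are not data from the paper.

```python
import math

BASES = "ACGT"  # row order of the matrix: A, C, G, T


def frequency_matrix(sites):
    """Build a 4 x width matrix of base frequencies from equal-length sites.

    Each row corresponds to one base, each column to one site position.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    n = len(sites)
    return [
        [sum(site[pos] == base for site in sites) / n for pos in range(width)]
        for base in BASES
    ]


def information_content(matrix, background=0.25):
    """Stand-in significance score (an assumption, not the paper's statistic).

    Sums f * log2(f / background) over all matrix elements, in bits.
    Zero frequencies contribute nothing to the sum.
    """
    total = 0.0
    for row in matrix:
        for freq in row:
            if freq > 0:
                total += freq * math.log2(freq / background)
    return total


# Made-up aligned sites, for illustration only (not real LexA sequences).
sites = ["CTGT", "CTGA", "CTGT", "ATGT"]
matrix = frequency_matrix(sites)
score = information_content(matrix)
```

Under this stand-in score, a matrix built from more strongly conserved sites scores higher, which matches the intuition that such a matrix is less likely to arise by chance.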


Similar resources

Multi-alphabet consensus algorithm for identification of low specificity protein-DNA interactions.

A method for the identification and characterization of protein-DNA interactions is presented. We have developed an approach for finding unknown multiple patterns that occur imperfectly in a set of several sequences. The pattern may contain letters from the nucleotide alphabet (A, C, G and T) including ambiguous characters (A/C, A/G, A/T; A/C/G, etc.). This method reveals weak DNA signals on an...


Learning Consensus Patterns in Unaligned DNA Sequences Using a Genetic Algorithm

We use a floating point GA to learn a classification rule which discriminates a set of related, unaligned DNA sequences known to contain a biological signal from other sequences which do not contain the signal. The classification rule learned by the GA is in the form of a DNA specificity matrix with a fixed threshold. We translate the matrix into a consensus pattern by using the matrix to align the p...


Recognition of multiple patterns in unaligned sets of sequences. Comparison of kernel clustering method with other methods

MOTIVATION: Transcription factor binding sites often differ significantly in their primary sequence and can hardly be aligned. Often one set of sites can contain several subsets of sequences that follow not just one but several different patterns. There is a need for sensitive methods to reveal multiple patterns in unaligned sets of sequences. RESULTS: We developed a novel method for analysis o...


Subtle Signal Discoveries in Unaligned Molecular Sequences Using Self-Organizing Neural Networks

In this paper, we study the problem of subtle signal discoveries in unaligned DNA and protein sequences. Motifs, also known as approximate common substrings, are good examples of subtle signals in DNA and protein sequences. The problem of motif identification in DNA and protein sequences has been studied for many years in the literature. Major hurdles at this point include computational complex...


Motif discoveries in unaligned molecular sequences using self-organizing neural networks

In this paper, we study the problem of motif discoveries in unaligned DNA and protein sequences. The problem of motif identification in DNA and protein sequences has been studied for many years in the literature. Major hurdles at this point include computational complexity and reliability of the search algorithms. We propose a self-organizing neural network structure for solving the problem of ...



Journal:
  • Computer applications in the biosciences : CABIOS

Volume 6, Issue 2

Pages: -

Publication date: 1990